{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6yPJ1ONIwOnn"
      },
      "source": [
        "# Using lmdeploy for Yi-1.5-6B-Chat Model Inference\n",
        "\n",
        "Welcome to this tutorial! Here, we'll guide you on how to use lmdeploy for inference with the Yi-1.5-6B-Chat model. lmdeploy provides a comprehensive solution for lightweight deployment and service of LLM tasks. Let's get started!"
      ],
      "id": "6yPJ1ONIwOnn"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "h4DWEghnwOnq"
      },
      "source": [
        "## 🚀 Running on Colab\n",
        "\n",
        "We also provide a [Colab script](https://colab.research.google.com/drive/1q3ROpne6ulkoybBzemeanHY6vNP9ykjV?usp=drive_link) for one-click execution to make development easier!"
      ],
      "id": "h4DWEghnwOnq"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "k7AGj_M6wOnr"
      },
      "source": [
        "## Installation\n",
        "\n",
        "First, we need to install the necessary dependencies:"
      ],
      "id": "k7AGj_M6wOnr"
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "installation",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "6c2627c4-75f2-4448-aea9-963fe4c0ad86"
      },
      "source": [
        "!pip install lmdeploy"
      ],
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Collecting lmdeploy\n",
            "  Downloading lmdeploy-0.5.1-cp310-cp310-manylinux2014_x86_64.whl.metadata (15 kB)\n",
            "Requirement already satisfied: accelerate>=0.29.3 in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (0.32.1)\n",
            "Collecting einops (from lmdeploy)\n",
            "  Downloading einops-0.8.0-py3-none-any.whl.metadata (12 kB)\n",
            "Collecting fastapi (from lmdeploy)\n",
            "  Downloading fastapi-0.111.1-py3-none-any.whl.metadata (26 kB)\n",
            "Collecting fire (from lmdeploy)\n",
            "  Downloading fire-0.6.0.tar.gz (88 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m88.4/88.4 kB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting mmengine-lite (from lmdeploy)\n",
            "  Downloading mmengine_lite-0.10.4-py3-none-any.whl.metadata (20 kB)\n",
            "Requirement already satisfied: numpy<2.0.0 in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (1.25.2)\n",
            "Collecting peft<=0.11.1 (from lmdeploy)\n",
            "  Downloading peft-0.11.1-py3-none-any.whl.metadata (13 kB)\n",
            "Requirement already satisfied: pillow in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (9.4.0)\n",
            "Requirement already satisfied: protobuf in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (3.20.3)\n",
            "Requirement already satisfied: pydantic>2.0.0 in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (2.8.2)\n",
            "Collecting pynvml (from lmdeploy)\n",
            "  Downloading pynvml-11.5.3-py3-none-any.whl.metadata (8.8 kB)\n",
            "Requirement already satisfied: safetensors in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (0.4.3)\n",
            "Requirement already satisfied: sentencepiece in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (0.1.99)\n",
            "Collecting shortuuid (from lmdeploy)\n",
            "  Downloading shortuuid-1.0.13-py3-none-any.whl.metadata (5.8 kB)\n",
            "Collecting tiktoken (from lmdeploy)\n",
            "  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)\n",
            "Collecting torch<=2.2.2,>=2.0.0 (from lmdeploy)\n",
            "  Downloading torch-2.2.2-cp310-cp310-manylinux1_x86_64.whl.metadata (26 kB)\n",
            "Collecting torchvision<=0.17.2,>=0.15.0 (from lmdeploy)\n",
            "  Downloading torchvision-0.17.2-cp310-cp310-manylinux1_x86_64.whl.metadata (6.6 kB)\n",
            "Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from lmdeploy) (4.42.4)\n",
            "Collecting uvicorn (from lmdeploy)\n",
            "  Downloading uvicorn-0.30.3-py3-none-any.whl.metadata (6.5 kB)\n",
            "Collecting nvidia-nccl-cu12 (from lmdeploy)\n",
            "  Downloading nvidia_nccl_cu12-2.22.3-py3-none-manylinux2014_x86_64.whl.metadata (1.8 kB)\n",
            "Collecting nvidia-cuda-runtime-cu12 (from lmdeploy)\n",
            "  Downloading nvidia_cuda_runtime_cu12-12.5.82-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n",
            "Collecting nvidia-cublas-cu12 (from lmdeploy)\n",
            "  Downloading nvidia_cublas_cu12-12.5.3.2-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n",
            "Collecting nvidia-curand-cu12 (from lmdeploy)\n",
            "  Downloading nvidia_curand_cu12-10.3.6.82-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n",
            "Collecting triton<=2.2.0,>=2.1.0 (from lmdeploy)\n",
            "  Downloading triton-2.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)\n",
            "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.29.3->lmdeploy) (24.1)\n",
            "Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.29.3->lmdeploy) (5.9.5)\n",
            "Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.29.3->lmdeploy) (6.0.1)\n",
            "Requirement already satisfied: huggingface-hub in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.29.3->lmdeploy) (0.23.5)\n",
            "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from peft<=0.11.1->lmdeploy) (4.66.4)\n",
            "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic>2.0.0->lmdeploy) (0.7.0)\n",
            "Requirement already satisfied: pydantic-core==2.20.1 in /usr/local/lib/python3.10/dist-packages (from pydantic>2.0.0->lmdeploy) (2.20.1)\n",
            "Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from pydantic>2.0.0->lmdeploy) (4.12.2)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (3.15.4)\n",
            "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (1.13.1)\n",
            "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (3.3)\n",
            "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (3.1.4)\n",
            "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch<=2.2.2,>=2.0.0->lmdeploy) (2023.6.0)\n",
            "Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch<=2.2.2,>=2.0.0->lmdeploy)\n",
            "  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)\n",
            "Collecting nvidia-cuda-runtime-cu12 (from lmdeploy)\n",
            "  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)\n",
            "Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch<=2.2.2,>=2.0.0->lmdeploy)\n",
            "  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)\n",
            "Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch<=2.2.2,>=2.0.0->lmdeploy)\n",
            "  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)\n",
            "Collecting nvidia-cublas-cu12 (from lmdeploy)\n",
            "  Using cached nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)\n",
            "Collecting nvidia-cufft-cu12==11.0.2.54 (from torch<=2.2.2,>=2.0.0->lmdeploy)\n",
            "  Using cached nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)\n",
            "Collecting nvidia-curand-cu12 (from lmdeploy)\n",
            "  Using cached nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)\n",
            "Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch<=2.2.2,>=2.0.0->lmdeploy)\n",
            "  Using cached nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)\n",
            "Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch<=2.2.2,>=2.0.0->lmdeploy)\n",
            "  Using cached nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)\n",
            "Collecting nvidia-nccl-cu12 (from lmdeploy)\n",
            "  Downloading nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl.metadata (1.8 kB)\n",
            "Collecting nvidia-nvtx-cu12==12.1.105 (from torch<=2.2.2,>=2.0.0->lmdeploy)\n",
            "  Using cached nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.7 kB)\n",
            "Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch<=2.2.2,>=2.0.0->lmdeploy)\n",
            "  Downloading nvidia_nvjitlink_cu12-12.5.82-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)\n",
            "Collecting starlette<0.38.0,>=0.37.2 (from fastapi->lmdeploy)\n",
            "  Downloading starlette-0.37.2-py3-none-any.whl.metadata (5.9 kB)\n",
            "Collecting fastapi-cli>=0.0.2 (from fastapi->lmdeploy)\n",
            "  Downloading fastapi_cli-0.0.4-py3-none-any.whl.metadata (7.0 kB)\n",
            "Collecting httpx>=0.23.0 (from fastapi->lmdeploy)\n",
            "  Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)\n",
            "Collecting python-multipart>=0.0.7 (from fastapi->lmdeploy)\n",
            "  Downloading python_multipart-0.0.9-py3-none-any.whl.metadata (2.5 kB)\n",
            "Collecting email_validator>=2.0.0 (from fastapi->lmdeploy)\n",
            "  Downloading email_validator-2.2.0-py3-none-any.whl.metadata (25 kB)\n",
            "Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.10/dist-packages (from uvicorn->lmdeploy) (8.1.7)\n",
            "Collecting h11>=0.8 (from uvicorn->lmdeploy)\n",
            "  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)\n",
            "Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from fire->lmdeploy) (1.16.0)\n",
            "Requirement already satisfied: termcolor in /usr/local/lib/python3.10/dist-packages (from fire->lmdeploy) (2.4.0)\n",
            "Collecting addict (from mmengine-lite->lmdeploy)\n",
            "  Downloading addict-2.4.0-py3-none-any.whl.metadata (1.0 kB)\n",
            "Requirement already satisfied: rich in /usr/local/lib/python3.10/dist-packages (from mmengine-lite->lmdeploy) (13.7.1)\n",
            "Collecting yapf (from mmengine-lite->lmdeploy)\n",
            "  Downloading yapf-0.40.2-py3-none-any.whl.metadata (45 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m45.4/45.4 kB\u001b[0m \u001b[31m4.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.10/dist-packages (from tiktoken->lmdeploy) (2024.5.15)\n",
            "Requirement already satisfied: requests>=2.26.0 in /usr/local/lib/python3.10/dist-packages (from tiktoken->lmdeploy) (2.31.0)\n",
            "Requirement already satisfied: tokenizers<0.20,>=0.19 in /usr/local/lib/python3.10/dist-packages (from transformers->lmdeploy) (0.19.1)\n",
            "Collecting dnspython>=2.0.0 (from email_validator>=2.0.0->fastapi->lmdeploy)\n",
            "  Downloading dnspython-2.6.1-py3-none-any.whl.metadata (5.8 kB)\n",
            "Requirement already satisfied: idna>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from email_validator>=2.0.0->fastapi->lmdeploy) (3.7)\n",
            "Requirement already satisfied: typer>=0.12.3 in /usr/local/lib/python3.10/dist-packages (from fastapi-cli>=0.0.2->fastapi->lmdeploy) (0.12.3)\n",
            "Requirement already satisfied: anyio in /usr/local/lib/python3.10/dist-packages (from httpx>=0.23.0->fastapi->lmdeploy) (3.7.1)\n",
            "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx>=0.23.0->fastapi->lmdeploy) (2024.7.4)\n",
            "Collecting httpcore==1.* (from httpx>=0.23.0->fastapi->lmdeploy)\n",
            "  Downloading httpcore-1.0.5-py3-none-any.whl.metadata (20 kB)\n",
            "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from httpx>=0.23.0->fastapi->lmdeploy) (1.3.1)\n",
            "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch<=2.2.2,>=2.0.0->lmdeploy) (2.1.5)\n",
            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.26.0->tiktoken->lmdeploy) (3.3.2)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.26.0->tiktoken->lmdeploy) (2.0.7)\n",
            "Collecting httptools>=0.5.0 (from uvicorn[standard]>=0.12.0->fastapi->lmdeploy)\n",
            "  Downloading httptools-0.6.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)\n",
            "Collecting python-dotenv>=0.13 (from uvicorn[standard]>=0.12.0->fastapi->lmdeploy)\n",
            "  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)\n",
            "Collecting uvloop!=0.15.0,!=0.15.1,>=0.14.0 (from uvicorn[standard]>=0.12.0->fastapi->lmdeploy)\n",
            "  Downloading uvloop-0.19.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)\n",
            "Collecting watchfiles>=0.13 (from uvicorn[standard]>=0.12.0->fastapi->lmdeploy)\n",
            "  Downloading watchfiles-0.22.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)\n",
            "Collecting websockets>=10.4 (from uvicorn[standard]>=0.12.0->fastapi->lmdeploy)\n",
            "  Downloading websockets-12.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)\n",
            "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich->mmengine-lite->lmdeploy) (3.0.0)\n",
            "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from rich->mmengine-lite->lmdeploy) (2.16.1)\n",
            "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy->torch<=2.2.2,>=2.0.0->lmdeploy) (1.3.0)\n",
            "Requirement already satisfied: importlib-metadata>=6.6.0 in /usr/local/lib/python3.10/dist-packages (from yapf->mmengine-lite->lmdeploy) (8.0.0)\n",
            "Requirement already satisfied: platformdirs>=3.5.1 in /usr/local/lib/python3.10/dist-packages (from yapf->mmengine-lite->lmdeploy) (4.2.2)\n",
            "Requirement already satisfied: tomli>=2.0.1 in /usr/local/lib/python3.10/dist-packages (from yapf->mmengine-lite->lmdeploy) (2.0.1)\n",
            "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->httpx>=0.23.0->fastapi->lmdeploy) (1.2.2)\n",
            "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.10/dist-packages (from importlib-metadata>=6.6.0->yapf->mmengine-lite->lmdeploy) (3.19.2)\n",
            "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from markdown-it-py>=2.2.0->rich->mmengine-lite->lmdeploy) (0.1.2)\n",
            "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.10/dist-packages (from typer>=0.12.3->fastapi-cli>=0.0.2->fastapi->lmdeploy) (1.5.4)\n",
            "Downloading lmdeploy-0.5.1-cp310-cp310-manylinux2014_x86_64.whl (72.8 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m72.8/72.8 MB\u001b[0m \u001b[31m27.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading peft-0.11.1-py3-none-any.whl (251 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m251.6/251.6 kB\u001b[0m \u001b[31m17.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading torch-2.2.2-cp310-cp310-manylinux1_x86_64.whl (755.5 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m755.5/755.5 MB\u001b[0m \u001b[31m2.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hUsing cached nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)\n",
            "Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)\n",
            "Using cached nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)\n",
            "Downloading nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m166.0/166.0 MB\u001b[0m \u001b[31m3.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading triton-2.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (167.9 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m167.9/167.9 MB\u001b[0m \u001b[31m4.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hUsing cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)\n",
            "Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)\n",
            "Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)\n",
            "Using cached nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)\n",
            "Using cached nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)\n",
            "Using cached nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)\n",
            "Using cached nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)\n",
            "Downloading torchvision-0.17.2-cp310-cp310-manylinux1_x86_64.whl (6.9 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.9/6.9 MB\u001b[0m \u001b[31m51.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading einops-0.8.0-py3-none-any.whl (43 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.2/43.2 kB\u001b[0m \u001b[31m3.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading fastapi-0.111.1-py3-none-any.whl (92 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m92.2/92.2 kB\u001b[0m \u001b[31m8.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading uvicorn-0.30.3-py3-none-any.whl (62 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m62.8/62.8 kB\u001b[0m \u001b[31m5.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading mmengine_lite-0.10.4-py3-none-any.whl (451 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m451.7/451.7 kB\u001b[0m \u001b[31m34.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pynvml-11.5.3-py3-none-any.whl (53 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.1/53.1 kB\u001b[0m \u001b[31m4.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading shortuuid-1.0.13-py3-none-any.whl (10 kB)\n",
            "Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.1/1.1 MB\u001b[0m \u001b[31m60.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading email_validator-2.2.0-py3-none-any.whl (33 kB)\n",
            "Downloading fastapi_cli-0.0.4-py3-none-any.whl (9.5 kB)\n",
            "Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading httpx-0.27.0-py3-none-any.whl (75 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m7.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading httpcore-1.0.5-py3-none-any.whl (77 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m77.9/77.9 kB\u001b[0m \u001b[31m8.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading python_multipart-0.0.9-py3-none-any.whl (22 kB)\n",
            "Downloading starlette-0.37.2-py3-none-any.whl (71 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m71.9/71.9 kB\u001b[0m \u001b[31m6.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading addict-2.4.0-py3-none-any.whl (3.8 kB)\n",
            "Downloading yapf-0.40.2-py3-none-any.whl (254 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m254.7/254.7 kB\u001b[0m \u001b[31m22.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading dnspython-2.6.1-py3-none-any.whl (307 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m307.7/307.7 kB\u001b[0m \u001b[31m26.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading httptools-0.6.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (341 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m341.4/341.4 kB\u001b[0m \u001b[31m29.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)\n",
            "Downloading uvloop-0.19.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.4/3.4 MB\u001b[0m \u001b[31m77.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading watchfiles-0.22.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.2/1.2 MB\u001b[0m \u001b[31m53.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading websockets-12.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (130 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m130.2/130.2 kB\u001b[0m \u001b[31m11.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading nvidia_nvjitlink_cu12-12.5.82-py3-none-manylinux2014_x86_64.whl (21.3 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m21.3/21.3 MB\u001b[0m \u001b[31m66.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hBuilding wheels for collected packages: fire\n",
            "  Building wheel for fire (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for fire: filename=fire-0.6.0-py2.py3-none-any.whl size=117029 sha256=53dfa0d779abbe27e56283c408179a07ad278c57c3c92ddeb7b3dfab2378f834\n",
            "  Stored in directory: /root/.cache/pip/wheels/d6/6d/5d/5b73fa0f46d01a793713f8859201361e9e581ced8c75e5c6a3\n",
            "Successfully built fire\n",
            "Installing collected packages: addict, websockets, uvloop, triton, shortuuid, python-multipart, python-dotenv, pynvml, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, httptools, h11, fire, einops, dnspython, yapf, watchfiles, uvicorn, tiktoken, starlette, nvidia-cusparse-cu12, nvidia-cudnn-cu12, httpcore, email_validator, nvidia-cusolver-cu12, mmengine-lite, httpx, torch, fastapi-cli, torchvision, fastapi, peft, lmdeploy\n",
            "  Attempting uninstall: triton\n",
            "    Found existing installation: triton 2.3.1\n",
            "    Uninstalling triton-2.3.1:\n",
            "      Successfully uninstalled triton-2.3.1\n",
            "  Attempting uninstall: torch\n",
            "    Found existing installation: torch 2.3.1+cu121\n",
            "    Uninstalling torch-2.3.1+cu121:\n",
            "      Successfully uninstalled torch-2.3.1+cu121\n",
            "  Attempting uninstall: torchvision\n",
            "    Found existing installation: torchvision 0.18.1+cu121\n",
            "    Uninstalling torchvision-0.18.1+cu121:\n",
            "      Successfully uninstalled torchvision-0.18.1+cu121\n",
            "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
            "torchaudio 2.3.1+cu121 requires torch==2.3.1, but you have torch 2.2.2 which is incompatible.\n",
            "torchtext 0.18.0 requires torch>=2.3.0, but you have torch 2.2.2 which is incompatible.\u001b[0m\u001b[31m\n",
            "\u001b[0mSuccessfully installed addict-2.4.0 dnspython-2.6.1 einops-0.8.0 email_validator-2.2.0 fastapi-0.111.1 fastapi-cli-0.0.4 fire-0.6.0 h11-0.14.0 httpcore-1.0.5 httptools-0.6.1 httpx-0.27.0 lmdeploy-0.5.1 mmengine-lite-0.10.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.19.3 nvidia-nvjitlink-cu12-12.5.82 nvidia-nvtx-cu12-12.1.105 peft-0.11.1 pynvml-11.5.3 python-dotenv-1.0.1 python-multipart-0.0.9 shortuuid-1.0.13 starlette-0.37.2 tiktoken-0.7.0 torch-2.2.2 torchvision-0.17.2 triton-2.2.0 uvicorn-0.30.3 uvloop-0.19.0 watchfiles-0.22.0 websockets-12.0 yapf-0.40.2\n"
          ]
        }
      ],
      "id": "installation"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wzir_EMHwOns"
      },
      "source": [
        "## Loading the Model and Starting Inference\n",
        "\n",
        "We will use the Yi-1.5-6B-Chat model for this demonstration. Below are the memory and disk usage details for this model:\n",
        "\n",
        "| Model | GPU Memory Usage | Disk Usage |\n",
        "|-------|------------------|------------|\n",
        "| Yi-1.5-6B-Chat | 20.3G | 18G |\n",
        "\n",
        "Execute the following command to start inference:\n",
        "\n",
        "⚠️ Ensure your computer has enough GPU memory and disk space."
      ],
      "id": "wzir_EMHwOns"
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "inference",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "4684bdc0-0b76-4495-96cf-f49472e1cd7f"
      },
      "source": [
        "!lmdeploy chat 01-ai/Yi-1.5-6B-Chat"
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\rFetching 14 files:   0% 0/14 [00:00<?, ?it/s]\rFetching 14 files: 100% 14/14 [00:00<00:00, 75475.91it/s]\n",
            "chat_template_config:\n",
            "ChatTemplateConfig(model_name='yi', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability='chat', stop_words=None)\n",
            "engine_cfg:\n",
            "TurbomindEngineConfig(model_name='/root/.cache/huggingface/hub/models--01-ai--Yi-1.5-6B-Chat/snapshots/15fc040a9c9a81098f05ded04e6e519ed91b4f37', model_format=None, tp=1, session_len=2048, max_batch_size=1, cache_max_entry_count=0.8, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)\n",
            "[WARNING] gemm_config.in is not found; using default GEMM algo\n",
            "\n",
            "double enter to end input >>> Tell me a joke.\n",
            "\n",
            "<|im_start|>user\n",
            "Tell me a joke.<|im_end|>\n",
            "<|im_start|>assistant\n",
            "Why don't skeletons fight each other?\n",
            "\n",
            "They don't have the guts.\n",
            "\n",
            "double enter to end input >>> \n",
            "^C\n"
          ]
        }
      ],
      "id": "inference"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Sb_4HTnSwOnt"
      },
      "source": [
        "That's it! You have successfully performed inference with the Yi-1.5-6B-Chat model using lmdeploy. Feel free to try different models or tweak the configuration parameters to explore more possibilities. Enjoy your experimentation!"
      ],
      "id": "Sb_4HTnSwOnt"
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.8"
    },
    "colab": {
      "provenance": [],
      "machine_shape": "hm",
      "gpuType": "L4"
    },
    "accelerator": "GPU"
  },
  "nbformat": 4,
  "nbformat_minor": 5
}